This is the very first set of syntax in Python programming. These are concepts that we need to know and use repeatedly in future work.
The goal of this class is to begin writing syntactically correct snippets of code as a precursor to doing data analysis.
# Prints like a computer
course_num= 'BAUN493'
class_size= 13
pct_A= .9
avg_score= 96.58154
# Print a character string
print('Course Number:', course_num)
print('Class Size:', class_size)
print('Average Class Score:', avg_score)
print('% Scored an A or better:', pct_A)
Course Number: BAUN493
Class Size: 13
Average Class Score: 96.58154
% Scored an A or better: 0.9
String formatting with format() gives you a lot more flexibility to output results almost as in the English language.
The following are a few instructive examples that you should try to use in your work.
# Prints like English
course_num= 'BAUN493'
class_size= 13
pct_A= .9
avg_score= 96.58154
print('Course Number {0:s} has a class size of {1:d} with {2:.2%} \
scoring over an A with an Average Raw Score of {3:.2f}'.\
format(course_num, class_size, pct_A, avg_score))
Course Number BAUN493 has a class size of 13 with 90.00% scoring over an A with an Average Raw Score of 96.58
Observations:
These output formats are commonly used in the course. For more illustrations, you may look here
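As a hedged aside, the same format specifiers (s, d, .2%, .2f) also work inside f-strings, which are available from Python 3.6 onward. The sketch below reuses the course values from above; the name summary is introduced only for this illustration.

```python
course_num = 'BAUN493'
class_size = 13
pct_A = .9
avg_score = 96.58154

# Each {name:spec} pairs a variable with the same format spec used with format()
summary = (f'Course Number {course_num:s} has a class size of {class_size:d} '
           f'with {pct_A:.2%} scoring over an A '
           f'with an Average Raw Score of {avg_score:.2f}')
print(summary)
```

Whether to use format() or an f-string is mostly a matter of taste; the f-string keeps each variable next to its format specifier.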
In Python, a variable is created the moment you first assign a value to it.
When we run the code x = 3, the value 3 is saved in the computer memory. The computer memory has a large number of storage locations, and 3 is saved to one particular location. 3 has a unique identifier, and we can use it to access 3. The identifier is named x, and we named it that way when we ran the code x = 3.
x=3
y='Hello World'
# Syntactically correct way of defining 2 variables in one line of code. Not really a great idea
x=3;y='Hello World'
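A minimal sketch of the idea that a variable is a name attached to a stored value. The names value and alias are hypothetical and chosen so they do not overwrite x and y above.

```python
value = 3
alias = value                  # a second name for the same stored object
same_object = alias is value   # 'is' checks whether two names refer to one object

value = 'Hello World'          # rebinding value does not change what alias refers to
print(same_object)
print(alias, value)
```

Assigning a new value to a name simply points that name at a different object; other names that referred to the old object are unaffected.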
The boolean data type is either True or False. In Python, boolean variables are defined by the True and False keywords.
The output <class 'bool'> indicates the variable is a boolean data type.
Note the keywords True and False must have an uppercase first letter. Using a lowercase true returns an error.
x = True # True and False are case sensitive
y = False
print(type(x))
<class 'bool'>
x= true
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-59-9b2be4a47841> in <module>()
----> 1 x= true

NameError: name 'true' is not defined
There are three numeric types in Python:
int
float
complex
Variables of numeric types are created when you assign a value to them
x=1 ; y=2.5
print(type(x))
<class 'int'>
print(type(y))
<class 'float'>
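The third numeric type, complex, is not shown above. The following sketch (with hypothetical variable names) shows it alongside the two kinds of division, which is where int and float most often get mixed up.

```python
count = 7

ratio = count / 2    # true division always produces a float
whole = count // 2   # floor division of two ints produces an int
z = 2 + 3j           # a complex number: real part 2, imaginary part 3

print(type(ratio), ratio)
print(type(whole), whole)
print(type(z), z.real, z.imag)
```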
myString = "This is a test!!"
print(myString[0]) #produces 'T'
print(myString[1]) #produces 'h'
print(myString[2]) #produces 'i'
print(myString[3]) #produces 's'
print(myString[4]) #Guess what!!
T
h
i
s

Spaces ' ' are considered characters as well.
print(len(myString))
16
myString[15]
'!'
Knowing the length of a string can be very handy when you are trying to manipulate it in a loop.
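As a small illustration of using the length in a loop, the sketch below walks over every valid index and records where the spaces are. It uses a for loop and range(), both covered later in these notes; sample and positions are hypothetical names.

```python
sample = "This is a test!!"

positions = []
for i in range(len(sample)):   # i runs from 0 to len(sample) - 1
    if sample[i] == ' ':
        positions.append(i)

print(positions)
```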
print(myString)
print('================')
print('Last character: ', myString[-1])
print('2nd last character: ',myString[-2])
print('3rd last character: ',myString[-3])
print('4th from last: ',myString[-4])
print('First character: ', myString[-len(myString)])
This is a test!!
================
Last character:  !
2nd last character:  !
3rd last character:  t
4th from last:  s
First character:  T
print('1. Variable myString: ', myString, x)
1. Variable myString: This is a test!! True
print('1. Variable myString: ', myString)
print('2. Starts at 0 and prints until index 2: ', myString[0:3])
print('3. Starts at 3 and prints until index 8: ', myString[3:9])
print('4. Starts at 3 and prints until index 8, taking every 2nd character: ', myString[3:9:2])
print('5. Leaving start blank assumes 0: ', myString[:3])
print('6. Leaving end blank assumes you want until the end of the string: ', myString[3:])
print('7. Leaving both ends blank reproduces the string: ', myString[:])
print('8. A step of -1 reverses the string: ', myString[::-1])
1. Variable myString:  This is a test!!
2. Starts at 0 and prints until index 2:  Thi
3. Starts at 3 and prints until index 8:  s is a
4. Starts at 3 and prints until index 8, taking every 2nd character:  si 
5. Leaving start blank assumes 0:  Thi
6. Leaving end blank assumes you want until the end of the string:  s is a test!!
7. Leaving both ends blank reproduces the string:  This is a test!!
8. A step of -1 reverses the string:  !!tset a si sihT
Tuples are sequences of objects, just like lists.
Tuples are immutable i.e. you cannot modify tuples.
T1 = (5,4,2,6,7)
print(T1)
print(type(T1))
T2= 5,4,2,6,7
print(T2)
print(type(T2))
x1 = (5)
print(x1)
print(type(x1))
T2= (5,) # Tuple with only single element is defined by specifying item with comma within an optional pair of parentheses.
print(T2)
print(type(T2))
(5, 4, 2, 6, 7)
<class 'tuple'>
(5, 4, 2, 6, 7)
<class 'tuple'>
5
<class 'int'>
(5,)
<class 'tuple'>
Tuples are IMMUTABLE
T1= (5,4,2,6,7)
print(T1)
T1[2] = 4
(5, 4, 2, 6, 7)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-87-9bdf4fec2945> in <module>()
      1 T1= (5,4,2,6,7)
      2 print(T1)
----> 3 T1[2] = 4

TypeError: 'tuple' object does not support item assignment
T1 = [(2,3),(4,5),('b',7),(8,9)]
print(type(T1))
for t in T1:
    print (t)
    print(type(t))
<class 'list'>
(2, 3)
<class 'tuple'>
(4, 5)
<class 'tuple'>
('b', 7)
<class 'tuple'>
(8, 9)
<class 'tuple'>
T1= (2,'b',3.2,10,(4,2))
print(T1)
i1,i2,i3,i4,i5 = T1
print(i1,i2,i3,i4,i5)
print(i5)
print(type(i5))
(2, 'b', 3.2, 10, (4, 2))
2 b 3.2 10 (4, 2)
(4, 2)
<class 'tuple'>
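Unpacking, as in the cell above, has a common everyday use: swapping two values without a temporary variable. The names below are hypothetical and only serve this sketch.

```python
first, second = 'left', 'right'

# The right-hand side is packed into a tuple, then unpacked into the two names
first, second = second, first
print(first, second)

# Unpacking also works on a tuple stored in a variable
point = (4, 2)
px, py = point
print(px + py)
```

Note that the number of names on the left must match the number of items in the tuple, otherwise Python raises a ValueError.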
Note: We shall now take a short detour into Python control structures and loops before going any further with data types, as control structures are essential for manipulating and inspecting them.
Two important control structures:
The analyst can choose the statement that is most useful for the given circumstance.
from IPython.display import Image
Image("https://media.geeksforgeeks.org/wp-content/uploads/20191122131748/Python-logical-and-operator2.jpg")
a = 3
if a <= 5:
    print("Condition is True")
Condition is True
a = 0
print ("Another Statement")
if a == 0:
    print("I am in IF Block")
else:
    print("I am in ELSE Block")
Another Statement
I am in IF Block
var = 4900
if (var < 200 and var >= 50):
    print ("Expression value is less than 200 & greater than 50")
    if var == 150:
        print ("Which is 150")
    elif var == 100:
        print ("Which is 100")
    elif var == 50:
        print ("Which is 50")
elif var < 50:
    print ("Expression value is less than 50")
else:
    print ("Could not find true expression")
Could not find true expression
Note:
== : For testing equivalence
< : Less than
>= : Greater than or equal to
from IPython.display import Image
Image("https://media.geeksforgeeks.org/wp-content/uploads/20191101172216/for-loop-python.jpg")
n = 12
for i in range(2,n):
    print(i)
2
3
4
5
6
7
8
9
10
11
The range() function returns a sequence of numbers that starts from 0 by default, increments by 1 by default, and stops before a specified number.
range(start, stop, step)
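To make the three arguments concrete, here is a short sketch. Wrapping range() in list() is only there so the generated numbers can be printed at once; the variable names are hypothetical.

```python
first_five = list(range(5))           # only stop given: starts at 0, step 1
every_third = list(range(2, 12, 3))   # start 2, stop before 12, step 3
countdown = list(range(10, 0, -2))    # a negative step counts down

print(first_five)
print(every_third)
print(countdown)
```

In every case the stop value itself is excluded from the result.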
from IPython.display import Image
Image("https://media.geeksforgeeks.org/wp-content/uploads/20191101170515/while-loop.jpg")
n = 5
i = 1
while i <= n:
    print(i)
    i = i + 1
1
2
3
4
5
myString
'This is a test!!'
for myVar in myString:
    # print(myVar)
    if myVar == 'T':
        print ("I found a T!")
I found a T!
for letter in 'Django': # First Example
    if letter == 'D':
        continue
    print ('Current Letter:', letter)
Current Letter: j
Current Letter: a
Current Letter: n
Current Letter: g
Current Letter: o
for letter in 'Python': # Second Example
    if letter == 'h':
        break
    print ('Current Letter :', letter)
print ('Out of For')
Current Letter : P
Current Letter : y
Current Letter : t
Out of For
List:
A list is a data structure that holds an ordered collection of items i.e. you can store a sequence of items in a list.
The list of items should be enclosed in square brackets so that Python understands that you are specifying a list.
Once you have created a list, you can add, remove or search for items in the list.
Since we can add and remove items, we say that a list is a mutable data type i.e. this type can be altered.
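The adding, removing and searching mentioned above can be sketched with the built-in list methods append, insert, remove and index, plus the in operator. The list shopping and its contents are made up for this illustration.

```python
shopping = ['apple', 'mango']

shopping.append('rice')          # add an item at the end
shopping.insert(0, 'banana')     # add an item at a given position
shopping.remove('mango')         # remove the first item equal to 'mango'

has_rice = 'rice' in shopping    # search: is the item present?
position = shopping.index('rice')  # search: at which index?

print(shopping)
print(has_rice, position)
```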
x = [1,2,3]
print(type(x))
<class 'list'>
y = [10.5,23.45,34.56,45.78]
print(y)
[10.5, 23.45, 34.56, 45.78]
z = ['q','r','s','t']
print (z)
['q', 'r', 's', 't']
x = [10,25,10,38,25]
print(x)
[10, 25, 10, 38, 25]
x = [10,25,12,38,24]
print(x[0])
print(x[3])
#print (x[5])
10
38
x = [10,25,10,38,25]
print(x[-1])
25
len(x)
5
a = []
print(a)
b=list()
print(b)
[]
[]
x = [10,25,10,38,25]
print(x[1])
x[1] = 5
print(x[1])
print(x)
25
5
[10, 5, 10, 38, 25]
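Calling list(a) copies all the elements of a list into a new list, so changing the copy leaves the original untouched. Plain assignment (b = a) only gives the same list a second name, so a change made through either name is visible through both.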
a = [2,4,6]
c = list(a)
print(c)
c[0] = 10
print(c)
print(a)
[2, 4, 6]
[10, 4, 6]
[2, 4, 6]
a = [2,4,6]
b = a
print(b)
b[1] = 12
print(a ,b)
[2, 4, 6]
[2, 12, 6] [2, 12, 6]
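The difference between the two cells above (a copy made with list() versus a second name made with =) can be checked directly with the is operator. This is a minimal sketch with hypothetical names.

```python
original = [2, 4, 6]
alias = original            # same list object, second name
copied = list(original)     # new list object with the same elements

alias.append(8)             # changes the one list shared by original and alias

print(original is alias)    # identity check: same object?
print(original is copied)
print(original, copied)
```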
a = [2,4,6]
b = [3,2,5]
print (a[1]==b[1])
print (a[0]==b[1])
print (a==b)
False
True
False
a = [0,1,2,3,4,5,6,7,8,9,10]
print(a[1:4])
print(a[:6])
print(a[3:])
print(a[2:8:2])
print(a[2:10:2])
[1, 2, 3]
[0, 1, 2, 3, 4, 5]
[3, 4, 5, 6, 7, 8, 9, 10]
[2, 4, 6]
[2, 4, 6, 8]
A list can have other lists as its elements.
x = [[2,4,6] ,[3,2,5]]
print (x)
for i in x:
    print(i)
    for j in i :
        print(j)
#### Print individual elements for following list.
y = [[[2],[2,3,4],[6,5]] ,[[3],[2,5]]]
[[2, 4, 6], [3, 2, 5]]
[2, 4, 6]
2
4
6
[3, 2, 5]
3
2
5
List comprehensions are used for creating a new list from another iterable.
As a list comprehension returns a list, it consists of brackets containing the expression to be executed for each element, along with the for loop that iterates over each element.
Basic syntax:
new_list = [expression for_loop_one_or_more conditions]
from IPython.display import Image
Image("https://4.bp.blogspot.com/-uRPZqKbIGwQ/XRtgWhC6qqI/AAAAAAAAH0w/--oGnwKsnpo00GwQgH2gV3RPwHwK8uONgCLcBGAs/s1600/comprehension.PNG")
numbers = [1,2,3,4]
squares=[]
for i in numbers:
    squares.append(i**2)
print(squares)
[1, 4, 9, 16]
Above loop in the form of a list comprehension
squares2 = [i**2 for i in numbers]
print(squares2)
[1, 4, 9, 16]
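The basic syntax also allows a condition after the for clause. As a small sketch (the name even_squares is hypothetical), the comprehension below keeps only the even numbers before squaring them.

```python
numbers = [1, 2, 3, 4]

# expression: n**2, loop: for n in numbers, condition: n is even
even_squares = [n**2 for n in numbers if n % 2 == 0]
print(even_squares)
```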
Finding common numbers in two lists
# Find common numbers from two lists using a list comprehension.
list_a = [1, 2, 3, 4]
list_b = [2, 3, 4, 5]
common_num = [a for a in list_a for b in list_b if a == b]
print(common_num) # Output: [2, 3, 4]
[2, 3, 4]
Creating a list of tuples
list_a = [1, 2, 3]
list_b = [2, 7]
different_num = [(a, b) for a in list_a for b in list_b if a != b]
print(different_num) # Output: [(1, 2), (1, 7), (2, 7), (3, 2), (3, 7)]
[(1, 2), (1, 7), (2, 7), (3, 2), (3, 7)]
Pairs of keys and values are specified in a dictionary by using the notation
d = {key1 : value1, key2 : value2 }
Notice that the key-value pairs are separated by a colon and the pairs are separated themselves by commas and all this is enclosed in a pair of curly braces.
Remember that a dictionary keeps its key-value pairs in the order they were inserted (guaranteed from Python 3.7 onward) but does not sort them. If you want a particular order, then you will have to sort them yourself before using it.
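Sorting a dictionary yourself can be sketched with the built-in sorted() function. The dictionary scores and its contents are made up for this illustration; passing scores.get as the key is one way, among others, to sort the keys by their values.

```python
scores = {'carol': 88, 'alice': 95, 'bob': 72}

names_sorted = sorted(scores)                             # keys in alphabetical order
by_score = sorted(scores, key=scores.get, reverse=True)   # keys ordered by value, highest first

for name in by_score:
    print(name, scores[name])
```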
The dictionaries that we will be using are instances/objects of the dict class.
A variable assigned empty curly braces has the data type "dict", but a variable assigned curly braces around a single value has the data type "set". Refer to variables d1 and d3 respectively.
d1 = {}
print (d1)
print(type(d1))
{}
<class 'dict'>
d2 = dict()
print (d2)
{}
d3 = {3}
print (d3)
print (type(d3))
{3}
<class 'set'>
d1 = {'a' : 3 ,'b':4}
print (d1)
print (type(d1))
{'a': 3, 'b': 4}
<class 'dict'>
d1['b']
4
Creating Dictionary using list of tuples.
d1 = dict([('a',3), ('b',5)])
print (d1)
{'a': 3, 'b': 5}
Creating Dictionary using list of lists.
d2= dict([['a',3], ['b',5]])
print (d2)
{'a': 3, 'b': 5}
d3= {'result1':[23,45,60], 'result2':[34,45,67]}
d = {1:'a' , 2:'b'}
print (d)
d[5] = 'g'
print(d)
{1: 'a', 2: 'b'}
{1: 'a', 2: 'b', 5: 'g'}
d = {1:'a' , 2:'b'}
print (d)
d[1] = 'g'
print(d)
{1: 'a', 2: 'b'}
{1: 'g', 2: 'b'}
print(d[1]) # getting values using keys
g
.values(), .keys(), .items()
print(d.values()) # this returns a view of the values
dict_values(['g', 'b'])
print(d.keys()) # this returns a view of the keys
dict_keys([1, 2])
print(d.items()) # this returns a view of the (key, value) pairs as tuples
dict_items([(1, 'g'), (2, 'b')])
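In practice these views are most often used in a loop. The sketch below, with hypothetical names, iterates over .items() and also shows two safe ways to deal with a key that may be missing: the in operator and .get() with a default.

```python
grades = {1: 'g', 2: 'b'}

pairs = []
for key, value in grades.items():   # each item is a (key, value) tuple
    pairs.append(f'{key} -> {value}')
print(pairs)

has_key = 5 in grades                # membership test looks at the keys
fallback = grades.get(5, 'missing')  # returns the default instead of raising KeyError
print(has_key, fallback)
```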
Keys must be immutable (more precisely, hashable) data types. Tuples and strings are immutable, so they can be used as keys. Lists and sets are mutable, so they cannot be used as keys.
# Keys should always be Unique
d = {1:'a', 1:'b'}
print (d)
{1: 'b'}
d = {1:'a', 2:'b'}
print (d)
d = {'a': 2, 'b' : 3}
print (d)
{1: 'a', 2: 'b'}
{'a': 2, 'b': 3}
d = {(1,3):'a', 1:'b'}
print (d)
d1 = {[1,3]:'a', 1:'b'}
print (d1)
# Since List is not Immutable, it cannot be used as a key
{(1, 3): 'a', 1: 'b'}
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-155-19023ff5733e> in <module>()
      1 d = {(1,3):'a', 1:'b'}
      2 print (d)
----> 3 d1 = {[1,3]:'a', 1:'b'}
      4 print (d1)
      5 # Since List is not Immutable, it cannot be used as a key

TypeError: unhashable type: 'list'
def print_max(a, b):
    if a > b:
        print (a, 'is maximum')
    elif a == b:
        print (a, 'is equal to', b)
    else:
        # b is the larger value in this branch
        print (b, 'is maximum')
# directly pass literal values
print_max(3, 4) # calling the function
x = 5
y = 7
# pass variables as arguments
print_max(x, y)
4 is maximum
7 is maximum
Python has a nifty feature called documentation strings, usually referred to by its shorter name docstrings.
Docstrings are an important tool that you should make use of, since they help to document the program better and make it easier to understand.
def print_max(x, y):
    '''Prints the maximum of two numbers.
    The two values must be integers.
    Input Arguments: (x,y) 2 real numbers
    Output: None
    Effect: The maximum of the 2 is printed'''
    # convert to integers, if possible
    x = int(x)
    y = int(y)
    if x > y:
        print (x, 'is maximum')
    else:
        print (y, 'is maximum')
print (print_max.__doc__)
Prints the maximum of two numbers.
    The two values must be integers.
    Input Arguments: (x,y) 2 real numbers
    Output: None
    Effect: The maximum of the 2 is printed
def say(message, times=3):
    print (message * times)
say('Hello ')
say('World ', 5)
Hello Hello Hello 
World World World World World 
def say1(message='Hi ', times=3):
    print (message * times)
say1('Hello ')
Hello Hello Hello
say1('World ', 5)
World World World World World
say1()
Hi Hi Hi
- Only those parameters which are at the end of the parameter list can be given default argument values i.e. you cannot have a parameter with a default argument value preceding a parameter without a default argument value in the function’s parameter list.
########################
def say2(message='Hi ', times):
    print (message * times)
say2('Hello ')
say2('World ', 5)
  File "<ipython-input-174-efc8a6f8cac7>", line 3
    def say2(message='Hi ', times):
            ^
SyntaxError: non-default argument follows default argument
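One way to avoid this error is to move the parameter with a default value to the end of the parameter list. The function say_fixed below is a hypothetical corrected variant written for this sketch (it returns the string instead of printing it); it also shows that arguments can be passed by keyword in any order.

```python
def say_fixed(times, message='Hi '):
    """Hypothetical corrected version: the defaulted parameter comes last."""
    return message * times

print(say_fixed(2))                           # message falls back to its default
print(say_fixed(3, 'Hello '))                 # both arguments given by position
print(say_fixed(message='World ', times=2))   # keyword arguments, any order
```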
def add(x,y):
    print(x+y)
When there is no return statement in the function, it returns None
ans = add(2,3)
print(ans)
5
None
But when we use a return statement, the function returns the specified value.
def add2(x,y):
    z = x+y
    return z
ans2 = add2(2,3)
print(ans2)
5
# Python program to count the frequency of
# elements in a list using a dictionary
def CountFrequency(my_list):
    # Creating an empty dictionary
    freq = {}
    for item in my_list:
        if (item in freq):
            freq[item] += 1
        else:
            freq[item] = 1
    for key, value in freq.items():
        print ("% d : % d"%(key, value))
# Driver function
if __name__ == "__main__":
    my_list =[1, 1, 1, 5, 5, 3, 1, 3, 3, 1, 4, 4, 4, 2, 2, 2, 2]
    CountFrequency(my_list)
 1 :  5
 5 :  2
 3 :  3
 4 :  3
 2 :  4
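For comparison, the standard library offers collections.Counter, which does the same counting in one step. This is an alternative to the hand-written loop above, not a replacement for understanding it; sample_list and the other names are chosen for this sketch.

```python
from collections import Counter

sample_list = [1, 1, 1, 5, 5, 3, 1, 3, 3, 1, 4, 4, 4, 2, 2, 2, 2]

freq = Counter(sample_list)          # maps each element to its count
for key, value in freq.items():
    print(key, ':', value)

# most_common(1) returns a list holding the single (element, count) pair with the highest count
top_item, top_count = freq.most_common(1)[0]
print(top_item, top_count)
```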
The basic import statement is executed in two steps:
- find a module, loading and initializing it if necessary
- define a name or names in the local namespace for the scope where the import statement occurs.
When the statement contains multiple clauses (separated by commas) the two steps are carried out separately for each clause, just as though the clauses had been separated out into individual import statements.
Python code in one module gains access to the code in another module by the process of importing it. It can be done by using import keyword.
import pandas
help(pandas)
Help on package pandas:

NAME
    pandas

DESCRIPTION
    pandas - a powerful data analysis and manipulation library for Python
    =====================================================================

    **pandas** is a Python package providing fast, flexible, and expressive data
    structures designed to make working with "relational" or "labeled" data both
    easy and intuitive. It aims to be the fundamental high-level building block for
    doing practical, **real world** data analysis in Python. Additionally, it has
    the broader goal of becoming **the most powerful and flexible open source data
    analysis / manipulation tool available in any language**. It is already well on
    its way toward this goal.

    Main Features
    -------------
    Here are just a few of the things that pandas does well:

      - Easy handling of missing data in floating point as well as non-floating
        point data.
      - Size mutability: columns can be inserted and deleted from DataFrame and
        higher dimensional objects
      - Automatic and explicit data alignment: objects can be explicitly aligned
        to a set of labels, or the user can simply ignore the labels and let
        `Series`, `DataFrame`, etc. automatically align the data for you in
        computations.
      - Powerful, flexible group by functionality to perform split-apply-combine
        operations on data sets, for both aggregating and transforming data.
      - Make it easy to convert ragged, differently-indexed data in other Python
        and NumPy data structures into DataFrame objects.
      - Intelligent label-based slicing, fancy indexing, and subsetting of large
        data sets.
      - Intuitive merging and joining data sets.
      - Flexible reshaping and pivoting of data sets.
      - Hierarchical labeling of axes (possible to have multiple labels per tick).
      - Robust IO tools for loading data from flat files (CSV and delimited),
        Excel files, databases, and saving/loading data from the ultrafast HDF5
        format.
      - Time series-specific functionality: date range generation and frequency
        conversion, moving window statistics, date shifting and lagging.
PACKAGE CONTENTS _config (package) _libs (package) _testing _typing _version api (package) arrays (package) compat (package) conftest core (package) errors (package) io (package) plotting (package) testing tests (package) tseries (package) util (package) SUBMODULES _hashtable _lib _tslib offsets CLASSES builtins.object Panel SparseDataFrame SparseSeries __DatetimeSub __SparseArraySub class Panel(builtins.object) | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) SparseArray = class __SparseArraySub(builtins.object) | Methods defined here: | | emit_warning(dummy=0) | | ---------------------------------------------------------------------- | Static methods defined here: | | __new__(cls, *args, **kwargs) | Create and return a new object. See help(type) for accurate signature. | | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) class SparseDataFrame(builtins.object) | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) class SparseSeries(builtins.object) | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) datetime = class __DatetimeSub(builtins.object) | Methods defined here: | | emit_warning(dummy=0) | | ---------------------------------------------------------------------- | Static methods defined here: | | __new__(cls, *args, **kwargs) | Create and return a new object. See help(type) for accurate signature. 
| | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined) DATA IndexSlice = <pandas.core.indexing._IndexSlice object> NA = <NA> NaT = NaT __docformat__ = 'restructuredtext' __git_version__ = 'b5958ee1999e9aead1938c0bba2b674378807b3d' describe_option = <pandas._config.config.CallableDynamicDoc object> get_option = <pandas._config.config.CallableDynamicDoc object> np = <pandas.__numpy object> options = <pandas._config.config.DictWrapper object> reset_option = <pandas._config.config.CallableDynamicDoc object> set_option = <pandas._config.config.CallableDynamicDoc object> VERSION 1.1.5 FILE /usr/local/lib/python3.6/dist-packages/pandas/__init__.py
/usr/lib/python3.6/inspect.py:441: FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime instead.
  srch_obj = srch_cls.__getattr__(cls, name)
/usr/lib/python3.6/pydoc.py:220: FutureWarning: The pandas.datetime class is deprecated and will be removed from pandas in a future version. Import from datetime instead.
  fields = getattr(object, '_fields', [])
import pandas as pd
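The three common forms of the import statement can be sketched with standard-library modules, so the example runs even where pandas is not installed. The alias stats is an arbitrary choice for this illustration, in the same spirit as pd for pandas.

```python
import math                   # plain import: names are accessed as math.<name>
import statistics as stats    # import under a shorter alias
from math import sqrt         # import one name directly into the local namespace

area = math.pi * 2 ** 2
average = stats.mean([1, 2, 3])
root = sqrt(16)

print(area, average, root)
```

Aliases such as pd are a convention rather than a requirement; the module behaves the same under any name.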